NML Computation Algorithms for Tree-Structured Multinomial Bayesian Networks

Authors

Petri Kontkanen

Hannes Wettig

Petri Myllymäki

Abstract

Typical problems in bioinformatics involve large discrete datasets. Therefore, in order to apply statistical methods in such domains, it is important to develop efficient algorithms suitable for discrete data. The minimum description length (MDL) principle is a theoretically well-founded, general framework for performing statistical inference. The mathematical formalization of MDL is based on the normalized maximum likelihood (NML) distribution, which has several desirable theoretical properties. In the case of discrete data, straightforward computation of the NML distribution requires exponential time with respect to the sample size, since the definition involves a sum over all the possible data samples of a fixed size. In this paper, we first review some existing algorithms for efficient NML computation in the case of multinomial and naive Bayes model families. Then we proceed by extending these algorithms to more complex, tree-structured Bayesian networks.
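To make the cost argument in the abstract concrete, the following is a minimal sketch for the plain multinomial case only, not the paper's tree-structured algorithm. It computes the NML normalizing sum in two ways: by brute force over all K^n samples, and via a recurrence known from the NML literature, C(k+2, n) = C(k+1, n) + (n/k) C(k, n). All function names are hypothetical, and the choice of this particular recurrence is an assumption about which efficient method is being reviewed.

```python
from collections import Counter
from itertools import product
from math import comb, prod


def max_likelihood(sample, n):
    """Maximized multinomial likelihood of one sample: prod_k (h_k / n) ** h_k."""
    counts = Counter(sample)
    return prod((h / n) ** h for h in counts.values())


def normalizer_brute_force(K, n):
    """Sum the maximized likelihood over all K**n samples (exponential in n)."""
    return sum(max_likelihood(s, n) for s in product(range(K), repeat=n))


def normalizer_binary(n):
    """C(2, n): sum over the count h of the first value instead of over samples."""
    # Python evaluates 0.0 ** 0 as 1.0, which is the convention needed here.
    return sum(
        comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
        for h in range(n + 1)
    )


def normalizer_recurrence(K, n):
    """C(K, n) via C(k+2, n) = C(k+1, n) + (n / k) * C(k, n), for K >= 1, n >= 1."""
    prev, curr = 1.0, normalizer_binary(n)  # C(1, n) and C(2, n)
    if K == 1:
        return prev
    for k in range(1, K - 1):
        prev, curr = curr, curr + (n / k) * prev
    return curr


def nml_probability(sample, K):
    """NML probability of a sample: maximized likelihood divided by C(K, n)."""
    n = len(sample)
    return max_likelihood(sample, n) / normalizer_recurrence(K, n)
```

The brute-force version is only usable for tiny K and n, which is exactly the exponential cost the abstract refers to; the recurrence needs a number of arithmetic operations linear in n + K. Comparing the two on small inputs is a simple sanity check of the recurrence.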

Related Articles

Calculating the NML Distribution for Tree-Structured Bayesian Networks

We are interested in model class selection. We want to compute a criterion which, given two competing model classes, chooses the better one. When learning Bayesian network structures from sample data, an important issue is how to evaluate the goodness of alternative network structures. Perhaps the most commonly used model (class) selection criterion is the marginal likelihood, which is obtained...

Parent Assignment Is Hard for the MDL, AIC, and NML Costs

Several hardness results are presented for the parent assignment problem: Given m observations of n attributes x1, ..., xn, find the best parents for xn, that is, a subset of the preceding attributes so as to minimize a fixed cost function. This attribute or feature selection task plays an important role, e.g., in structure learning in Bayesian networks, yet little is known about its computa...

Efficient Computation of NML for Bayesian Networks

Bayesian networks are parametric models for multidimensional domains exhibiting complex dependencies between the dimensions (domain variables). A central problem in learning such models is how to regularize the number of parameters; in other words, how to determine which dependencies are significant and which are not. The normalized maximum likelihood (NML) distribution or code offers an inform...

Revisiting enumerative two-part crude MDL for Bernoulli and multinomial distributions (Extended version)

We exploit the Minimum Description Length (MDL) principle as a model selection technique for Bernoulli distributions and compare several types of MDL codes. We first present a simplistic crude two-part MDL code and a Normalized Maximum Likelihood (NML) code. We then focus on the enumerative two-part crude MDL code, suggest a Bayesian interpretation for finite size data samples, and exhibit a st...

Calculating the Normalized Maximum Likelihood Distribution for Bayesian Forests

When learning Bayesian network structures from sample data, an important issue is how to evaluate the goodness of alternative network structures. Perhaps the most commonly used model (class) selection criterion is the marginal likelihood, which is obtained by integrating over a prior distribution for the model parameters. However, the problem of determining a reasonable prior for the parameters...

Volume: 2007

Publication date: 2007
